Statistical Machine Translation of Broadcast News from Spanish to Portuguese
Abstract
In this paper we describe the work carried out to develop an automatic system for translating broadcast news from Spanish to Portuguese. Two challenging topics of speech and language processing were involved: Automatic Speech Recognition (ASR) of the Spanish news and Statistical Machine Translation (SMT) of the results into Portuguese. ASR of broadcast news is based on the AUDIMUS.MEDIA system, a hybrid ANN/HMM system with multiple stream decoding. The system achieved a 22.08% Word Error Rate (WER) on a Spanish Broadcast News task, which is comparable to other international state-of-the-art systems. Parallel normalized texts from the European Parliament database were used to train the Spanish-to-Portuguese SMT system. A preliminary, non-exhaustive human evaluation showed a fluency of 3.74 and a sufficiency of 4.23.
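For readers unfamiliar with the WER figure quoted above, the following is a minimal sketch of how the metric is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name and the whitespace tokenization are assumptions made for illustration; the abstract does not describe the scoring tool or text normalization actually used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting,
    which is an assumption, not the paper's normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of four reference words.
    print(word_error_rate("las noticias de hoy", "las noticias hoy"))
```

Under this definition, a WER of 22.08% means roughly 22 word-level errors per 100 reference words. Because insertions are counted, the value can exceed 100% for very poor output.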
Similar resources
The ISL Statistical Machine Translation System for the TC-STAR Spring 2006 Evaluation
In this paper we describe the ISL statistical machine translation system used in the TC-STAR Spring 2006 Evaluation campaign. This system is based on PESA phrase-to-phrase translations which are extracted from a bilingual corpus. The translation model, language model and other features are combined in a log-linear model during decoding. We participated in the Spanish Parliament (Cortes) and Eur...
Fully Automatic Compilation of Portuguese-English and Portuguese-Spanish Parallel Corpora
This paper reports the fully automatic compilation of parallel corpora for Brazilian Portuguese. Scientific news texts available in Brazilian Portuguese, English and Spanish are automatically crawled from a multilingual Brazilian magazine. The texts are then automatically aligned at document- and sentence-level. The resulting corpora contain about 2,700 parallel documents totaling over 150,000 al...
The need to create a media block for the convergence of overseas news networks
As a general diplomacy arm of the Islamic Republic of Iran, VoSiMa has extensive activities in international broadcasting of its radio and television programs. These programs are broadcast in different languages, such as English, French, Azeri, Arabic, and ... for regional and transnational audiences. The large volume of the organization's international activities is in the form of news and new...
An Experiment in Spanish-Portuguese Statistical Machine Translation
Statistical approaches to machine translation have long been successfully applied to a number of ‘distant’ language pairs such as English-Arabic and English-Chinese. In this work we describe an experiment in statistical machine translation between two ‘related’ languages: European Spanish and Brazilian Portuguese. Preliminary results suggest not only that statistical approaches are comparable t...
Improving English-Spanish Statistical Machine Translation: Experiments in Domain Adaptation, Sentence Paraphrasing, Tokenization, and Recasing
We describe the experiments of the UC Berkeley team on improving English-Spanish machine translation of news text, as part of the WMT’08 Shared Translation Task. We experiment with domain adaptation, combining a small in-domain news bi-text and a large out-of-domain one from the Europarl corpus, building two separate phrase translation models and two separate language models. We further add a t...